Mining Stars with FP-Growth: a Case Study on Bibliographic Data

Authors

  • Andreia Silva
  • Cláudia Antunes
Abstract

Traditional data mining approaches look for patterns in a single table, whereas multi-relational data mining aims to identify patterns that involve multiple tables. In recent years, the most common mining techniques have been extended to the multi-relational context, but few are dedicated to data stored following the multi-dimensional model, in particular the star schema. These schemas are composed of a huge central fact table linking a set of small dimension tables. Joining all the tables before mining may not be feasible because of the usually massive number of records. This work proposes a method for mining frequent patterns on data following a star schema that does not materialize the join between the tables. Extending the FP-Growth algorithm, it constructs an FP-Tree for each dimension and then combines them through the records in the fact table to form a super FP-Tree. This tree is then mined with FP-Growth to find all frequent patterns. The paper presents a case study on bibliographic data, comparing the efficiency and scalability of our algorithm against FP-Growth.
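To make the idea of mining a star schema without materializing the join more concrete, the following is a minimal Python sketch, not the authors' algorithm. It simplifies the approach described in the abstract: instead of building one FP-Tree per dimension and merging them, it streams each fact row's items (looked up in the dimension tables) directly into a single FP-Tree and mines it with a plain FP-Growth. The tables `authors`, `venues`, and `facts`, the item names, and the helper `star_transactions` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

class Node:
    """One FP-Tree node: an item, a count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_sup):
    """Build an FP-Tree from (items, count) pairs; return (header, frequent)."""
    transactions = list(transactions)
    support = Counter()
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_sup}
    header = defaultdict(list)  # item -> all tree nodes holding that item
    root = Node(None, None)
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent

def fp_growth(transactions, min_sup, suffix=()):
    """Yield (pattern, support) for every frequent pattern."""
    header, frequent = build_tree(transactions, min_sup)
    for item, sup in frequent.items():
        yield tuple(sorted(suffix + (item,))), sup
        # Conditional pattern base: the prefix path of each node of `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_sup, suffix + (item,))

# Hypothetical star schema: each dimension maps a key to its attribute items.
authors = {1: ["country=PT", "area=DM"], 2: ["country=PT", "area=DB"]}
venues = {10: ["type=journal"], 11: ["type=conf"]}
# The fact table stores only foreign keys: (author_key, venue_key).
facts = [(1, 10), (1, 10), (2, 10), (1, 11)]

def star_transactions(facts, dimensions):
    """Aggregate identical key combinations, then look up each key's items.
    No joined table is ever stored."""
    for keys, count in Counter(facts).items():
        items = [i for dim, k in zip(dimensions, keys) for i in dim[k]]
        yield items, count

patterns = dict(fp_growth(star_transactions(facts, [authors, venues]), min_sup=3))
```

Under these assumptions, `patterns` maps each frequent itemset (as a sorted tuple) to its support over the fact rows. Aggregating duplicate key combinations before insertion is one simple way to exploit the small size of the dimension tables; the per-dimension trees of the paper are a more refined version of the same goal.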

Similar resources

Pattern Mining on Stars with FP-Growth

Most existing data mining (DM) approaches look for patterns in a single table. Multi-relational DM approaches, on the other hand, look for patterns that involve multiple tables. In recent years, the most common DM techniques have been extended to the multi-relational case, but there are few dedicated to star schemas. These schemas are composed of a central fact table, linking a set of dimension...

Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)

In this paper, we want to improve association rules so that they can be used in recommenders. Recommender systems provide a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users appropriate items. Among the data mining methods, finding frequent item sets and ...

A Novel Method for Selecting the Supplier Based on Association Rule Mining

One of the important problems in supply chain management is supplier selection. A company holds massive data from various departments, so extracting knowledge from the company’s data is very complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...

Gallbladder Segmentation in 2-D Ultrasound Images Using Deformable Contour Methods

  • Gallbladder Segmentation in 2-D Ultrasound Images using Deformable Contour Methods. M. Ciecholewski
  • Pattern Mining on Stars with FP-Growth. A. Silva, C. Antunes
  • Non-hierarchical Clustering of Decision Tables toward Rough Set-based Group Decision Aid. M. Inuiguchi, R. Enomoto, Y. Kusunoki
  • An Enhanced Framework Of Subjective Logic For Semantic Document Analysis. S. Manna, B. Sumudu. U. Mendis...

Implementation of Web Usage Mining Using APRIORI and FP Growth Algorithms

Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along w...

Journal:
  • International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

Volume 19

Publication date: 2011